{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Highlights of the Code\n",
    "\n",
    "1. **Replication of Table A**  \n",
    "   This code replicates Table A from Internet Appendix C of the paper.\n",
    "\n",
    "<br>\n",
    "\n",
    "2. **Validating the Prediction Bands for Neural-Network-based Risk Premium Forecasts**  \n",
    "   Using Monte Carlo simulations, the code demonstrates that the paper's confidence intervals for\n",
    "   NN-based risk premium have accurate coverages. \n",
    "\n",
    "<br>\n",
    "\n",
    "3. **Validating the Prediction Bands for Neural-Network-based Risk Premium Forecasts**  \n",
    "   In addition, simulations confirm that the paper's prediction bands for the raw excess returns\n",
    "   are also correctly calibrated. This result validates the paper's prior choices and the \n",
    "   resultant posteriors for the factor loading coefficients. \n",
    "\n",
    "\n",
    "4. **Copyright Protection**  \n",
    "   The code is intended solely for research and non-commercial purposes.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Using TensorFlow backend.\n"
     ]
    }
   ],
   "source": [
    "import numpy as np # linear algebra\n",
    "import multiprocessing as mp\n",
    "import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)\n",
    "import seaborn as sns\n",
    "import numpy\n",
    "import pandas\n",
    "from joblib import Parallel, delayed\n",
    "import keras  \n",
    "from keras import regularizers\n",
    "from keras.models import Sequential\n",
    "from keras.layers import Dense\n",
    "from keras.wrappers.scikit_learn import KerasRegressor\n",
    "from sklearn.model_selection import cross_val_score\n",
    "from sklearn.model_selection import KFold\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "from sklearn.pipeline import Pipeline\n",
    "from keras.layers import BatchNormalization\n",
    "from keras.layers import Activation\n",
    "from keras import optimizers\n",
    "from sklearn.metrics import r2_score\n",
    "from keras import callbacks\n",
    "from keras.callbacks import EarlyStopping, ModelCheckpoint\n",
    "from keras.layers import Dropout\n",
    "from keras.models import load_model\n",
    "from sklearn.metrics import mean_squared_error\n",
    "from sklearn.linear_model import ElasticNet\n",
    "from numpy.linalg import inv\n",
    "from scipy import stats\n",
    "import scipy\n",
    "from scipy import stats\n",
    "from scipy.optimize import minimize\n",
    "from scipy.stats import invgauss\n",
    "from sklearn import linear_model\n",
    "#from scipy.stats import invgauss\n",
    "import scipy.stats as sc\n",
    "from numpy.linalg import matrix_rank\n",
    "from sklearn.linear_model import LinearRegression\n",
    "from portsort import portsort as ps\n",
    "#import gc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np # linear algebra\n",
    "import multiprocessing as mp\n",
    "import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)\n",
    "import seaborn as sns\n",
    "import numpy\n",
    "import pandas\n",
    "from joblib import Parallel, delayed\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import keras  \n",
    "from keras import regularizers\n",
    "from keras.models import Sequential\n",
    "from keras.layers import Dense\n",
    "from keras.wrappers.scikit_learn import KerasRegressor\n",
    "from sklearn.model_selection import cross_val_score\n",
    "from sklearn.model_selection import KFold\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "from sklearn.pipeline import Pipeline\n",
    "from keras.layers import BatchNormalization\n",
    "from keras.layers import Activation\n",
    "from keras import optimizers\n",
    "from sklearn.metrics import r2_score\n",
    "from keras import callbacks\n",
    "from keras.callbacks import EarlyStopping, ModelCheckpoint\n",
    "from keras.layers import Dropout\n",
    "from keras.models import load_model\n",
    "from sklearn.metrics import mean_squared_error"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The simulated data perfectly mimic the simulations of Gu, Kelly, and Xiu (2020) (GKX, henceforth), except for the specification on the model residuals, which I model using a factor model. I thank GKX for making their Matlab codes publicly available, which I adapt (and attached separately) to obtain the simulated data.   "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "def rsq_oos(y,ypred): \n",
    "    SSR = np.sum((y-ypred)**2)\n",
    "    MSS = np.sum((y)**2)\n",
    "    r_squared =1-SSR/MSS \n",
    "    return r_squared"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "class MCDropout(keras.layers.Dropout):\n",
    "    def call(self,inputs):\n",
    "        return super().call(inputs,training=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "###This code models data using a 3-layer NN with dropout & l2 regualrizations. The code randomly drops Neurons \n",
    "#while making predictions \n",
    "\n",
    "\n",
    "def pred_model(xtrain,ytrain,lrpar,l1par,xtest,ytest):\n",
    "    model = Sequential()\n",
    "    model.add(MCDropout(0.1, input_shape=(200,)))\n",
    "    \n",
    "    model.add(Dense(32, input_dim=200, kernel_regularizer=regularizers.l2(l1par)))#,\n",
    "    model.add(MCDropout(0.1, input_shape=(200,)))\n",
    "    model.add(Activation('relu'))\n",
    "    \n",
    "    \n",
    "    model.add(Dense(16, input_dim=200, kernel_regularizer=regularizers.l2(l1par)))#,\n",
    "    model.add(MCDropout(0.1, input_shape=(200,)))\n",
    "    model.add(Activation('relu'))\n",
    "    \n",
    "    \n",
    "    model.add(Dense(8, input_dim=200, kernel_regularizer=regularizers.l2(l1par)))#,\n",
    "    model.add(MCDropout(0.1, input_shape=(200,)))\n",
    "    model.add(Activation('relu'))\n",
    "    \n",
    "    model.add(Dense(1, kernel_initializer='normal'))\n",
    "    adam=optimizers.Adam(lr=lrpar)\n",
    "    model.compile(loss='mean_squared_error', optimizer=adam)\n",
    "   \n",
    "    \n",
    "    \n",
    "    model.fit(xtrain,ytrain,epochs=200,verbose=0) \n",
    "    \n",
    "    return model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "###This function models data using a 3-layer NN with dropout & l2 regualrizations. But this code does not\n",
    "## drop neurons while making predictions \n",
    "def pred_modeld():\n",
    "    model = Sequential()\n",
    "    model.add(Dropout(0.1, input_shape=(200,)))\n",
    "\n",
    "    model.add(Dense(32, input_dim=200, kernel_regularizer=regularizers.l2(l1par)))\n",
    "    model.add(Dropout(0.1, input_shape=(200,)))\n",
    "    model.add(Activation('relu'))\n",
    "    \n",
    "    model.add(Dense(16, input_dim=200, kernel_regularizer=regularizers.l2(l1par)))#,\n",
    "    model.add(Dropout(0.1, input_shape=(200,)))\n",
    "    model.add(Activation('relu'))\n",
    "    \n",
    "    \n",
    "    model.add(Dense(8, input_dim=200, kernel_regularizer=regularizers.l2(l1par)))#,\n",
    "    model.add(Dropout(0.1, input_shape=(200,)))\n",
    "    model.add(Activation('relu'))\n",
    "    \n",
    "    model.add(Dense(1, kernel_initializer='normal'))\n",
    "    adam=optimizers.Adam(lr=lrpar)\n",
    "    model.compile(loss='mean_squared_error', optimizer=adam)\n",
    "    \n",
    "    \n",
    "    return model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "#l1par=[0.00002]\n",
    "#lrpar=[0.00002]\n",
    "\n",
    "l1par=[0.00001]\n",
    "lrpar=[0.0003]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "l1par1=[0.00001]\n",
    "lrpar1=[0.0003]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "import random\n",
    "import tensorflow as tf\n",
    "seed=0\n",
    "\n",
    "random.seed(seed)\n",
    "np.random.seed(seed)\n",
    "tf.random.set_seed(seed)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "def confidence_levels():\n",
    "    risk_premium_coverages=np.zeros((100,10))\n",
    "    raw_return_coverages=np.zeros((100,10))\n",
    "    prob_95 = np.zeros(100)\n",
    "    prob_90 = np.zeros(100)\n",
    "    prob_80 = np.zeros(100)\n",
    "    prob_70=np.zeros(100)\n",
    "    prob_60=np.zeros(100)\n",
    "    prob_50=np.zeros(100)\n",
    "    prob_40=np.zeros(100)\n",
    "    prob_30=np.zeros(100)\n",
    "    prob_20=np.zeros(100)\n",
    "    prob_10=np.zeros(100)\n",
    "    prob_raw_95 = np.zeros(100)\n",
    "    prob_raw_90 = np.zeros(100)\n",
    "    prob_raw_80 = np.zeros(100)\n",
    "    prob_raw_70=np.zeros(100)\n",
    "    prob_raw_60=np.zeros(100)\n",
    "    prob_raw_50=np.zeros(100)\n",
    "    prob_raw_40=np.zeros(100)\n",
    "    prob_raw_30=np.zeros(100)\n",
    "    prob_raw_20=np.zeros(100)\n",
    "    prob_raw_10=np.zeros(100)\n",
    "    \n",
    "    \n",
    "    for M in range(1,101):\n",
    "        c=pd.read_csv('D:/bayesian jmp/Simunonlin/SimuData_p100/c{}.csv'.format(M),header=None)\n",
    "        r1=pd.read_csv('D:/bayesian jmp/Simunonlin/SimuData_p100/r1_{}.csv'.format(M),header=None)\n",
    "        f1=pd.read_csv('D:/bayesian jmp/Simunonlin/SimuData_p100/f1_{}.csv'.format(M),header=None)   \n",
    "\n",
    "        er=pd.read_csv('D:/bayesian jmp/Simunonlin/SimuData_p100/er1_{}.csv'.format(M),header=None)\n",
    "        er=er.iloc[24000:36000]\n",
    "        er.columns=['TrueSimulatedER']\n",
    "        er=er.reset_index().drop(columns='index')\n",
    "\n",
    "\n",
    "\n",
    "        xtrain=c.iloc[0:12000,:] ##TRaining data\n",
    "        xtest=c.iloc[12000:24000,:]\n",
    "        xoos=c.iloc[24000:36000,:]\n",
    "\n",
    "\n",
    "        ytrain=r1.iloc[0:12000:,0]\n",
    "        ytest=r1.iloc[12000:24000,0]\n",
    "        yoos=r1.iloc[24000:36000,0]\n",
    "        modeld=pred_model(xtrain,ytrain,lrpar,l1par,xtest,ytest)\n",
    "\n",
    "       \n",
    "        mytestmodel=pred_modeld()\n",
    "        mytestmodel.set_weights(modeld.get_weights())\n",
    "        ytrainpred=mytestmodel.predict(xtrain).ravel()\n",
    "        #ytestpred=mytestmodel.predict(xtest).ravel()\n",
    "        errortrain=ytrain-ytrainpred\n",
    "\n",
    "        xerror=f1.iloc[0:12000,:]\n",
    "        xterror=f1.iloc[12000:24000,:]\n",
    "        xooserror=c.iloc[24000:36000,0:2]\n",
    "        \n",
    "        \n",
    "        errorscale=np.std(xerror)[0]\n",
    "        \n",
    "        \n",
    "        #xerror=(xerror-np.mean(xerror))/np.std(xerror)\n",
    "        #xterror=(xterror-np.mean(xterror))/np.std(xterror)\n",
    "        reg = LinearRegression(fit_intercept=False).fit(xerror, errortrain)\n",
    "        errorfit=reg.predict(xerror)\n",
    "        xyerror=xerror.transpose().dot(errortrain)\n",
    "        errorvar=np.var(errortrain-errorfit)\n",
    "\n",
    "        Aerror= xerror.transpose().dot(xerror)\n",
    "        Aerrorinv=np.linalg.inv(Aerror)\n",
    "\n",
    "\n",
    "        errorbetain=Aerrorinv.dot(xyerror)\n",
    "        factor_std=np.std(xerror.iloc[:,0]/(c.iloc[0:12000,0])) ##Estimating the factor variance\n",
    "        confid=[]\n",
    "        predmean=[]\n",
    "        for j in range (5):\n",
    "            errorbetains=np.random.multivariate_normal(errorbetain,0.1*Aerrorinv)\n",
    "            errorcor=xerror.dot(errorbetains) ## Parameter c in the paper \n",
    "        \n",
    "            errortcor=xterror.dot(errorbetains)\n",
    "            ytrainerror=ytrain-errorcor\n",
    "            ytesterror=ytest-errortcor\n",
    "            modelnd=pred_model(xtrain,ytrainerror,lrpar1,l1par1,xtest,ytesterror)\n",
    "            yoos_pred_st=np.stack([modelnd.predict(xoos) for sample in range(100)]) ## Code for obtaining CIs\n",
    "            confid.append(yoos_pred_st.std(axis=0))\n",
    "            predmean.append(yoos_pred_st.mean(axis=0))\n",
    "            \n",
    "            \n",
    "            \n",
    "            \n",
    "            \n",
    "            \n",
    "        \n",
    "        confide=np.square(confid)\n",
    "        confiden=np.sqrt(np.var(predmean,axis=0)+np.mean(confide,axis=0))\n",
    "        yoosbayes=np.mean(predmean,axis=0)\n",
    "        yoosconf=pd.DataFrame(confiden, columns = ['yoosconf'])\n",
    "        yoosbayes=pd.DataFrame(yoosbayes, columns = ['yoosbayes'])\n",
    "        \n",
    "        ytest_pred_st=np.stack([modelnd.predict(xtest) for sample in range(100)]) ## dropout-based risk premium forecasts on test sample\n",
    "        errorstd=np.std(ytest.reset_index(drop=True)-xterror.dot(errorbetain).reset_index(drop=True)-pd.DataFrame(ytest_pred_st.mean(axis=0)).iloc[:,0])\n",
    "        ##Estimated residual standard error (Var(epsilon))\n",
    "        \n",
    "        \n",
    "        xooserror=(c.iloc[24000:36000,0:2]).reset_index().drop(columns=['index'])\n",
    "        yooserrorstd=np.sqrt((factor_std*xooserror.dot(errorbetain))**2+(errorstd**2)) ##Forecast variance attributable to model residuals\n",
    "                                                \n",
    "            \n",
    "        yooserrorstd=pd.DataFrame(yooserrorstd, columns = ['yooserrorstd'])\n",
    "        \n",
    "        returns=pd.DataFrame(yoos)\n",
    "        returns.columns=['returns']\n",
    "        returns=returns.reset_index().reset_index().drop(columns='index')\n",
    "        Time=pd.DataFrame(np.repeat(range(60),200),columns=['Time'])\n",
    "        stockid=list(range(200))\n",
    "        Stockid=pd.DataFrame(stockid*60,columns=['StockID'])\n",
    "\n",
    "        final_data=pd.concat([Time,returns,er, yoosbayes,yoosconf,yooserrorstd,Stockid], axis=1)\n",
    "        final_data['tstat']=np.abs(final_data['TrueSimulatedER']-final_data['yoosbayes'])/final_data['yoosconf']\n",
    "        \n",
    "        final_data['totalerrorstd']=(final_data['yooserrorstd']**2+final_data['yoosconf']**2)**0.5\n",
    "        \n",
    "        final_data['rawtstat']=np.abs(final_data['returns']-final_data['yoosbayes'])/final_data['totalerrorstd']\n",
    "\n",
    "        \n",
    "        \n",
    "        prob_95[M-1]=(final_data.loc[final_data['tstat'] <=1.96].shape[0]/12000) # 95\n",
    "        prob_90[M-1]=(final_data.loc[final_data['tstat'] <=1.65].shape[0]/12000) # 90\n",
    "        prob_80[M-1]=(final_data.loc[final_data['tstat'] <=1.282].shape[0]/12000) # 80\n",
    "        prob_70[M-1]=(final_data.loc[final_data['tstat'] <=1.036].shape[0]/12000) # 70\n",
    "        prob_60[M-1]=(final_data.loc[final_data['tstat'] <=0.842].shape[0]/12000) # 60\n",
    "\n",
    "        prob_50[M-1]=(final_data.loc[final_data['tstat'] <=0.674].shape[0]/12000) # 50\n",
    "        prob_40[M-1]=(final_data.loc[final_data['tstat'] <=0.524].shape[0]/12000) # 40\n",
    "        prob_30[M-1]=(final_data.loc[final_data['tstat'] <=0.385].shape[0]/12000) # 30\n",
    "\n",
    "        prob_20[M-1]=(final_data.loc[final_data['tstat'] <=0.253].shape[0]/12000) # 20\n",
    "        prob_10[M-1]=(final_data.loc[final_data['tstat'] <=0.126].shape[0]/12000) # 10\n",
    "        \n",
    "        \n",
    "        \n",
    "        \n",
    "                \n",
    "        prob_raw_95[M-1]=(final_data.loc[final_data['rawtstat'] <=1.96].shape[0]/12000) # 95-percentile based on t(5) dist\n",
    "        prob_raw_90[M-1]=(final_data.loc[final_data['rawtstat'] <=1.65].shape[0]/12000) # 90\n",
    "        prob_raw_80[M-1]=(final_data.loc[final_data['rawtstat'] <=1.282].shape[0]/12000) # 80\n",
    "        prob_raw_70[M-1]=(final_data.loc[final_data['rawtstat'] <=1.036].shape[0]/12000) # 70\n",
    "        prob_raw_60[M-1]=(final_data.loc[final_data['rawtstat'] <=0.842].shape[0]/12000) # 60\n",
    "\n",
    "        prob_raw_50[M-1]=(final_data.loc[final_data['rawtstat'] <=0.674].shape[0]/12000) # 50\n",
    "        prob_raw_40[M-1]=(final_data.loc[final_data['rawtstat'] <=0.524].shape[0]/12000) # 40\n",
    "        prob_raw_30[M-1]=(final_data.loc[final_data['rawtstat'] <=0.385].shape[0]/12000) # 30\n",
    "\n",
    "        prob_raw_20[M-1]=(final_data.loc[final_data['rawtstat'] <=0.253].shape[0]/12000) # 20\n",
    "        prob_raw_10[M-1]=(final_data.loc[final_data['rawtstat'] <=0.126].shape[0]/12000) # 10\n",
    "        \n",
    "        risk_premium_coverages[M-1,:]=[prob_95[M-1], prob_90[M-1], prob_80[M-1],prob_70[M-1],prob_60[M-1],prob_50[M-1],prob_40[M-1],prob_30[M-1],prob_20[M-1],prob_10[M-1]]\n",
    "        raw_return_coverages[M-1,:]=[prob_raw_95[M-1], prob_raw_90[M-1], prob_raw_80[M-1],prob_raw_70[M-1],prob_raw_60[M-1],prob_raw_50[M-1],prob_raw_40[M-1],prob_raw_30[M-1],prob_raw_20[M-1],prob_raw_10[M-1]]\n",
    "\n",
    "    \n",
    "      \n",
    "    return risk_premium_coverages,raw_return_coverages\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "risk_premium_coverages,raw_return_coverages=confidence_levels()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0.92553667, 0.8754325 , 0.77514917, 0.67598167, 0.5783625 ,\n",
       "       0.48032833, 0.38342917, 0.2873775 , 0.19128917, 0.09613   ])"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.mean(risk_premium_coverages[range(100)],axis=0) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0.95209083, 0.91834167, 0.84251583, 0.75843417, 0.66719667,\n",
       "       0.56737833, 0.46151917, 0.35080083, 0.23573   , 0.11856167])"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.mean(raw_return_coverages[range(100)],axis=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
